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I. INTRODUCTION 



The use of fiber reinforced composite materials in load bearing 
structures has many advantages over conventional materials such as steel and 
aluminum. The high stiffness to weight ratio, strength to weight ratio, and 
resistance to corrosion are among the more commonly known advantages. 
Perhaps the most significant, but not widely known benefit of using 
composites is the redundancy intrinsic in the material. The redundancy 
results from the load being shared among a multitude of fibers imbedded in 
a binding matrix material. However, in order to realize the benefit of 
increased redundancy, it must be quantified in terms of reliability; i.e. the 
mathematical model and the parameters specific to the composite require 
determination. 

As with any structural design, it is imperative that reliability 
considerations be incorporated in the engineering design of a composite. 
Depending on the application, this may take the form of either functional 
reliability or human-safe reliability. Functional reliability refers to the level 
of assurance that the system under consideration will function within the 
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specified design parameters over the design life. If the life or injury of a 
human being is at stake, the issue becomes one of human-safe reliability. 
In either case, a description is required of how the material used in the 
design will behave under a given stress over time. This description is not 
possible without empirical data of the material in both strength and life. 

Strength and life data for conventional materials can be located in many 
material and design reference books. The reason for this is that those 
materials have been in use for so many years that the accumulated data is 
sufficient to adequately describe the relevant failure processes with a 
parametric model. This is not the case in the use of the composite materials, 
for numerous reasons. Composites have a relatively short history of use, and 
when utilized, the applications are typically considered high technology. 
High technology applications normally do not produce large numbers of 
sample sets. Being on the cutting edge means limited lead time, which 
further complicates the problem of reliability characterization. This limited 
base of failure data requires that experiments be performed in order to make 
it possible to understand how the composite behaves in time under load and 
to estimate the associated parameters of the relevant probability model. 
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Experiments, however, can be extremely costly in terms of both 
resources and time. In particular, life experiments present a formidable 
problem because of the large variability in life data. The large variation in 
the life data means that a substantial amount of time will pass, possibly 
hundreds of years, before all samples on test fail. Surely the designer is not 
going to wait to actually observe even the mean life of such a material in a 
life test. The solution to this problem, and the objective of this study, is to 
design the experiment to maximize the amount of information obtained under 
constrained resources and time. In order to achieve this objective, simulation 
is required because reliability is probabilistic, involving many combinations 
of the random variables. The problem, therefore, cannot be directly cast in 
terms of classical optimization techniques. An objective function cannot be 
specified to be minimized because the functions and the parameters of the 
function are not deterministic. 

The process simulated in this work is the random nature in which actual 
materials fail in both load exceedance (strength) and time exceedance (life). 
The random sets of computer generated failure data are used to simulate the 
kinds of results an actual experiment might produce. The intended methods 
of data analysis, such as non-parametric and parametric methods, are then 
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applied to the simulated data to determine if they will be adequate to 
produce the desired level of confidence in quantifying the material reliability. 
In the parametric approach, a model is selected based on the physical failure 
process of a sample in strength or life. For the known model, the parameters 
of the model require determination. Since the parameters themselves are 
probabilistic and will never be known precisely, they can only be estimated. 
Simulation is used to determine the impact on the statistics of the estimated 
parameters when a proposed experiment procedure is executed. 

The criterion which will be used to evaluate the knowledge gain 
resulting from one method of performing an experiment compared to another 
is the change in the information of the parameters. The information of a 
parameter is quantified through the application of information theory. 
Information will be shown to be a scalar value, which can be used as an 
optimality condition in the design of the experiment. In an optimization 
sense, the design variable is the choice of when to censor an experiment 
prior to the completion of all the tests to allow the multiple use of limited 
equipment in order to maximize the objective function of information. 

The methods of optimizing information in an experiment developed in 
this study through the use of simulation are being applied to a graphite fiber 
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life experiment in progress at the Mechanics of Materials for Composite 
Reliability Laboratory of the Naval Postgraduate School, Monterey CA. 



5 



II. BACKGROUND 



The purpose of this chapter is to provide the background information 
which motivated this investigation. This chapter is divided into three 
sections: methodologies of design life in engineering, the composite failure 
mechanism, and information theory. The first section is a contrast of the 
factor-of-safety and reliability approaches used in engineering design to 
determine the design life of a component or system. The second section is 
a description of the failure mechanisms of the composite in strength or life. 
The final topic is a discussion of information theory and how this concept 
might be implemented in the design of an experiment. 

A. METHODOLOGIES OF DESIGN LIFE IN ENGINEERING 

There are two different approaches used in mechanical engineering 
design to determine the design stresses and/or life of a component or system. 
They are known as the deterministic or factor-of-safety approach and the 
reliability approach [Shigley and Mischke,!]. 
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1. The Factor-of-Safety Approach 



This approach is based on the assumptions that the applied stresses, 
strength, and life of a specimen are deterministic. Since a probabilistic 
model is not needed, this method is simple to use in the design process and 
is popular for that reason. It is applicable for materials with extended 
engineering applications where experience can be used to adjust design 
parameters. 

A common method used in the factor-of-safety approach to obtain 
the design life of a component subjected to alternating and steady stresses 
is the implementation of the stress versus number of cycles to failure curve 
(S-N curve). The S-N curve is obtained by placing test samples under 
specific stress levels and counting the number of stress reversals or cycles 
up to the sample failure. For materials whose life is sensitive to the duration 
of stress, a similar curve could be constructed for static stress or for cases 
where time rather than cycles is the random variable. This is done by having 
the abscissa represent time to failure rather than cycles to failure. 

The S-N curve provided in Figure 2.1 exhibits the features of 
typical curves obtained for many materials. The term endurance limit (S e ) 
is used to define the stress at which the slope of the S-N curve is flat or 
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High cycle 



Low cycle 



j Infinite 
| life 




FIGURE 2.1 TYPICAL S-N CURVE [SHIGLEY AND MISCHKE, 1] 
approaching zero. This infers that for a design stress with a magnitude less 
than the endurance limit, the material will have infinite life. Since the 
material will have infinite life, the design life is no longer required to be a 
design parameter and is removed from the analysis. However implicitly, 
additional experience-based modifications are needed to describe the stress 
time history. 

There are fatigue failure methods used in design which are based 
on the existence of an endurance limit partitioned into the constant stress 
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time history (mean stress) and the alternating stress time history (cyclic 
stress). Three of the more common theories are the Soderberg, Modified 
Goodman, and Gerber criteria. These failure theories can be used to 
compute a safety factor for stress for particular values of mean and 
alternating stresses in the design, or to determine the mean or alternating 
stress corresponding to a desired factor of safety in life. 

There are two major shortcomings in using the S-N curve in 
design. The first shortcoming is that in physical systems, there is no such 
thing as infinite life. The factually observed failures at a stress lower than 
the endurance limit cannot be quantitatively defined. In fact, because the 
slope of the S-N curve is approaching zero, life is not single valued in terms 
of stress; i.e. according to the S-N curve small variations in applied stress 
levels will give rise to infinite variability in life. 

The second shortcoming is that the incremental increase in safety 
or functionality which results, for example, when a safety factor of 2 is used 
instead of 1.75 cannot be quantified. Therefore, the reliability approach is 
required to rationally answer the question often posed in the design process 
as to what is the likelihood that a component under design will fail in the 
service life. 
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2. Reliability Approach 



The reliability approach is based on the formulation that the 
applied stresses, strength, and life used in design are not deterministic. The 
objective is to select the random variables salient to the failure process, and 
describe these variables by either a nonparametric or parametric distribution 
so that a probable value for the component or system reliability can be 
calculated . The random variables required to determine the reliability of a 
composite structure are the applied stress and the duration of time for which 
the stress is applied. The realizations of the random variables are the stress 
at failure of the structure (strength) and the duration of time up to failure 
(life). The failure of a structure, therefore, is represented as a joint 
probability distribution in strength and life, in lieu of the S-N curve 
described in the previous section. Pragmatic safety factors are no longer 
required because the reliability of the component is quantifiable with this 
joint distribution. 

As an illustration of why the determination of the distributions is 
important, consider schematically the joint probability of failure distribution 
for a material in Figure 2.2. In this figure, the locii of the median failures 
is a solid line representing 50% of the samples that would have failed. The 
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FIGURE 2.2 JOINT PROBABILITY DISTRIBUTION OF STRENGTH 
AND LIFE 



way in which this curve is used is that the designer will enter the graph with 
the service stress and determine the mean life corresponding to that stress. 
This point is denoted as point ’A’ in Figure 2.2. In the factor-of-safety 
approach, this mean life will be reduced by some safety factor to the design 
point labelled ’B’ in the figure, which becomes the design point. The dotted 
line closest to the median curve in Figure 2.2 represents the loci of points for 
the maximum allowable number of failure, say 0.01%, for the case where the 
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distributions of the strength and life data have small variability. With respect 
to the design point, this is a safe design, as the probability of failure is very 
low for desired stress and service life. The dotted line furthest from the 
median curve in Figure 2.2 is the loci of the points of the same maximum 
allowable number of failure (0.01) but for a distribution representing larger 
variability in the strength and life data. This situation is higher risk with 
respect to the design point, because although the design appears safe relative 
to the curve of median failures, the actual probability of failure is larger than 
the specified maximum. 

This clearly shows why the reliability approach results in a much 
safer design. The S-N curve used in the factor-of-safety approach does not 
contain any information as to how the failures are distributed about the mean 
values. This approach is useful only when a large amount of data or 
experience is available for the materials used in the design where an 
appropriate safety factor can be pragmatically chosen. In the use of 
composite materials which utilize high technology fibers such as graphite, 
the data or experience in the material is not sufficient to warrant the use of 
safety factors. The relevant data must, therefore, be obtained from 
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experiments which are designed to produce the greatest knowledge of 
parameters for describing the distribution of the failure data. 

B. COMPOSITE FAILURE MECHANISMS 
1. Strength Model 

In order to model the failure mechanism of a composite in 
strength, it is necessary to understand how the fiber and matrix constituents 
interact when the structure is stressed. A fiber-reinforced composite consists 
of many long, small diameter fibers embedded in a matrix binder material. 
The role of the fibers is to act as the load carrying members in the material 
whereas the matrix serves to transfer the load from broken to adjacent non- 
broken fibers. If the stress at a point in the structure is high enough to cause 
a weak fiber to break, the matrix transfers the load from the broken fiber to 
its surrounding intact neighbors. The majority of load transfer occurs in the 
immediate vicinity of the break and decreases as the distance from the break 
is increased. This is shown graphically in Figure 2.3. 

Rosen [2] developed a relationship quantifying the distance 
required for the matrix to transfer the stress resulting from a broken fiber to 
the surrounding fibers. This quantity is called the ineffective length, and is 
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given in equation form as follows: 
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( 2 . 1 ) 



where: V f is the volume fraction of the fiber 

E f is the modulus of the fiber 
G m is the shear modulus of the matrix 
<J> is the fractional value, called the fiber load sharing 

efficiency, below which the fiber is not considered effective. 

The transfer of stress between neighboring fibers within an ineffective length 

will occur until the neighboring fibers themselves fail because of the 

increased stress. As the load is increased, the number of fiber failures will 

increase, which will eventually lead to the clustering of fiber failure sites. 

The result is that the local stress concentration is so great that the entire 

structure catastrophically fails. 

Knowledge of how the fiber fails in strength is therefore essential 
for the description of composite failure. A fiber failure will occur at the 
statistically weakest segment of the fiber, which may or may not be located 



at a point of high stress in the structure. Since the failure of the fiber is one 
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of extreme value governed by the weakest element, the Weibull distribution 
is used to model the failure process. The cumulative distribution function 
(CDF) using the Weibull distribution is 

F(* ( ) = l-exp<(-(^) 0 ) ( 2 . 2 ) 

The probability of failure of a fiber corresponding to a particular load can 
be computed using this model once the shape a and location (3 parameters 
are known. 




FIGURE 23 TENSILE FAILURE MODEL [ROSEN, 2] 
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In order to determine the strength of the composite, it is necessary 
to address the combinatorial probability of grouping of fiber failure sites. 
The model which describes this chanced clustering of fiber breaks was 
developed by Harlow and Phoenix [3,4] and is known as the Harlow and 
Phoenix Local Load Sharing model. In this model, a recursive relationship 
is used to predict the probability of failure based on the fiber strength 
distribution, all of the possible combinations of adjacent fiber breaks, and the 
resulting stress concentrations from those breaks within an ineffective length. 
If the number of adjacent breaks exceeds a critical value k, then the structure 
will catastrophically fail. 

The Harlow-Phoenix Local Load Sharing model was modified by 
Wu and Harlow to incorporate multi-modality in the fiber strength 
distributions which occur in the regions of low and high probabilities of 
failure. These regions will be subsequently referred to as the lower tail and 
upper tail, respectively. The resulting model is known as the Tri -modal 
Local Load Sharing model. 

2. Strength and Life Model 

The time dependence of mechanical breakdown of fibers was 
pioneered by B. Coleman [5] in 1958. Coleman developed the theory of 
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breaking kinetics, which is concerned with the problem of calculating the 
probability that a fiber breaks in a given interval of time when an ensemble 
of fibers is subjected to a loading history. In this theory, he develops a 
breakdown function £2(t) which is a measure of the breakdown in small 
subdivisions of the fiber, called slabs, which occurred under a loading 
history. Coleman concluded that the failure of a fiber ensemble is described 
by a death potential \}/(Q(t)); a function of the breakdown function. The 
interpretation is that the damage \j/ of the fiber is measured by the damage 
history Q(t). 

Phoenix and Wu [6] provided an overall synthesis of strength and 
life. The CDF for the time dependent failure of a fiber, in terms of 
Coleman’s damage potential, was shown to have the form 

r 

F(t;l) = l-exp{-t|r[/ k (l(s)ds]) , tz 0 ( 2 - 3 ) 

o 

where: l(t) is the stress history 

k(-) is the breakdown rule or damage function 
\j /(•) is the shape function (death function). 

They were able to cast the damage function in terms of an algebraic form 

which, when plotted, resembles the strength-life curve shown in Figure 2.2. 
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Once the time dependent failure properties of the fibers are known, 
a Local Load Sharing model with time dependent ineffective length and 
strength may be developed. This is possible because the failure mechanism 
of the composite in life is analogous to that in strength (i.e., local load 
sharing among neighboring fibers), with the exception that both the strength 
of the fiber and the ineffective length 5, which is matrix dominated, may 
change over time. The life of the composite system is described by a 
function of either a time dependent fiber strength element or a time 
dependent ineffective length or both. 

There are currently life tests in progress for AS-4 graphite fiber 
and composite strands of the AS-4 fiber at the Mechanics of Materials for 
Composite Reliability Laboratory at NPS. The overall objective of the tests 
is to produce strength-life data in order to determine the parameters of the 
failure model and define the time dependency of the ineffective length so 
that a joint probability distribution of the composite can be formed based on 
the properties of the fiber. 
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C. INFORMATION THEORY 



Shannon [7] developed a mathematical theory which dealt with the 
statistical nature of the problem of reproducing, either exactly or 
approximately, a selected message in communication. He introduced a 
quantity H which is a measure of information, choice, or uncertainty in a 
process. Shannon called this quantity H entropy as it serves as a measure 
of disorder; and it is defined by the relationship 



H = ~ K E p, log Pi (2 - 4) 

i=i 

where: p,, p 2 , ... , p n are the probabilities of occurrence of a set of 

possible events 

K is a positive constant (choice of a unit of measure). 

Shannon was primarily concerned with the amount of choice which was 
involved in the selection of an event or how uncertain one is of the outcome. 
In this regard, he stated that H would have the property of a monotonically 
increasing function of n if all p t are equal. The entropy will increase 
because, with equally likely events, there is more choice or uncertainty. 

Lindley [8] extended Shannon’s statistical concept of information to the 
notion of information in an experiment, rather than in a message. He 
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suggests the following rule of experimentation: "perform that experiment for 
which the expected gain in information is the greatest, and continue 
experimentation until a preassigned amount of information has been 
attained." Lindley further states that the maximum information will result 
when the probability distribution of the desired parameter is concentrated on 
a single value of the parameter, and that the information is reduced when 
there is any uncertainty in the value of the parameter. He defines 
information I to be 

/ = j p(0) log /? (0 ) J0 (2*5) 

Note that the form of this equation for information is similar in form to 
Shannon’s equation of entropy (Eq 2.4) with the exception of a lack of the 
minus sign. This minus sign was omitted because of the major differences 
in the goals of a person conducting an experiment and a person concerned 
with the choices in messages. The communication engineer is more 
interested in maximizing the choice in messages vice having concentration 
of the distribution on a single value. Hence, the negative in Shannon’s 
expression of entropy. The objective of performing the experiment, however, 
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is to reduce the uncertainty or increase the information of the parameters, so 
the negative sign is not utilized in Eq 2.5. 

The interest of this investigation is to maximize the knowledge of the 
parameters of an experiment for composite reliability characterization, under 
the constraints of equipment and time. The computation of the information 
/ established in Eq 2.5 will allow a measure to be made of how well the 
parameters of the model are known and measure the amount of increase in 
information is achieved by testing more samples. In this context, the 
information can be used as an optimality condition in experiment design. 
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III. QUANTIFICATION OF RELIABILITY 



As stated in the introduction, the quantification of reliability requires 
experimental data. The often high cost of experiments mandates that the 
experimental procedure and method of data analysis be selected to glean the 
most information. Once the data is obtained, the resulting reliability can be 
quantified using either a non-parametric or parametric approach. 
Regardless of the preferred approach, the goal is to determine a range of the 
random variable which can be utilized within a specified reliability level. 

This chapter is dedicated to evaluating the advantages and disadvantages 
of non-parametric and parametric approaches in the characterization of 
reliability. 

A. NON-PARAMETRIC RELIABILITY CHARACTERIZATION 

Non-parametric methods of characterizing reliability are based solely on 
the data obtained, without the analytical modeling of underlying failure 
processes. As an example of how data is analyzed using a non-parametric 
method, consider the graphs in Figure 3.1. The graph in Figure 3.1(a) is the 
underlying distribution of the simulated data set which is represented non- 
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parametrically in the form of a histogram. The three sample sizes of N=25, 
100, and 1000 are typical of the number of data available for materials which 
have little experience (i.e., very new material), moderate experience, and 
extensive experience, respectively. In Figure 3.1(b), the histogram of the 
small sample size of N=25 samples does not produce a meaningful shape 
which could be utilized for reliability characterization. Little improvement 
results when the number of samples is increased to N=100, as shown in 
Figure 3.1(c). The number of samples must be increased to N=1000 and 
beyond, as illustrated in Figure 3.1(d), before any meaningful resemblance 
to the underlying distribution can be made. The observation is that non- 
parametric methods are useful only when large amounts of data are available; 
and N>1000 is very large for engineering data. 

The advantage of using a non-parametric approach is that only data is 
needed. The disadvantage in the approach is that it is often not practical or 
possible in the case of time dependent experiments to obtain the large 
amount of data needed to make this approach produce meaningful results. 
As a result, reliability predictions for new engineering applications must 
almost always be based on a proper model with adequately estimated 
parameters from limited data. 
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Non-Parametric description of Probability Density Function 
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FIGURE 3.1 NON-PARAMETRIC DESCRIPTION OF F(X) 
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B. PARAMETRIC RELIABILITY CHARACTERIZATION 

The prediction of reliability in a parametric sense is based on the 
selection of an appropriate model and estimation of the associated parameters 
from data. The model must adequately represent the physical process of the 
phenomenon being characterized. In the case considered herein, the failure 
of a graphite fiber in tension is known to be governed by the strength of the 
weakest segment or link in the fiber. Since it is the presence of a extreme 
value, such as the weakest link, which causes the failure, the process is 
modeled using a Weibull distribution. The CDF of the Weibull distribution 
is given as 

F(x) = 1 - exp(-(|)“) (31) 

where: a is the shape parameter 

P is the location parameter. 

The shape parameter and location parameter a and p are analogous to the 
reciprocal standard deviation (1/a) and mean (ji) used in the Gaussian 
distribution, respectively. The importance of selecting the correct model 
cannot be overstated. No matter how diligent the effort is in determining the 
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parameters of the model, if the model is not correct, the resulting prediction 
will not be meaningful. 

As stated above, the Weibull distribution is known to be the appropriate 
model for predicting fiber reliability. The problem now becomes one of 
determining the values of the unknown parameters a and |3. Since an 
infinite number of samples cannot be tested, the true parameters will never 
be known; they can only be estimated. An estimated parameter, or 
estimator, will always possess some uncertainty. This introduces the concept 
that the estimators themselves may be described by a distribution. 

The objective in the design of an experiment is to develop a method of 
conducting the experiment which would result in the estimation of the 
estimators & and (3 within a desired range. The circular difficulty in this is 
that the experimental procedure cannot be planned in advance to produce the 
desired range of the estimators because little is known about the parameters 
prior to running the experiment. For example, if the application driving the 
experiment is one which requires a very high level of reliability, such as a 
structure in a nuclear safe system, then the estimated parameters will need 
to be determined with a very small variance from the true values. One may 
jump to the conclusion that this would require extensive testing and very 
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large data sets. However, it will be shown in a later section that certain 
processes have the intrinsic property of small variability in the estimators, so 
that only a small amount of data would suffice in achieving the certain 
bounds on the estimators, and the experiment could be planned accordingly. 
The dilemma is that the existence of this property would not be known 
without having first conducted the experiment to produce the data. One 
solution of this dilemma is to use simulation in the design of the experiment. 
A logical representation of the rationale of incorporating Monte Carlo 
simulation in the experiment design is provided in Figure 3.2. 

The first step of the simulation is to assume the values of underlying 
parameters for the simulated data set. If possible, the selection of these 
parameters should be within the anticipated range of values for the 
underlying parameters of the actual data set so that the simulated process 
will closely approximate the actual process. Although it is true that little 
information may be known about the parameters prior to commencement of 
the experiment, the expected range of parameters can normally be bracketed 
about a limited range of values. For example, considering the life of a 
graphite fiber, it is anticipated that the value of the shape parameter of the 
distribution will be between 0.1 and 1 because of the experience in glass 
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Validate optimal experimental design by Monte Carlo Simulation 
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FIGURE 3.2 OPTIMUM EXPERIMENT DESIGN BY MONTE 
CARLO SIMULATION 



fibers and Kevlar fibers. To explore the effects of the experiments, two 
separate simulations should be run with the underlying values of a assigned 
in the upper and lower range (i.e., 0.1 and 1) and the proposed procedures 
can be evaluated accordingly. 

The random data set is simulated using the known values of the above 
range of underlying parameters. Once the simulated data set is obtained, the 
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experiment design methodology is applied to that simulated set of data. In 

the case of this study, an information based experimental design 

methodology is used to obtain optimum estimation of the parameters. The 

£ A 

estimated parameters based on the simulated data, denoted a and p, are 
computed using the prescribed procedure and are compared with the known 
parameters. If the estimated parameters adequately recover the known values 
of the underlying parameters, the method is verified and can be implemented 
on an actual set of data. The analysis of the actual data using the verified 
information experimental design procedure will assure the optimal parameters 
estimation. 

This section provided an overview of how a random set of data is 
simulated and how it may be implemented in an experiment design. The 
remaining portion of this chapter will be dedicated to the presentation of the 
results of many different simulations conducted and what inferences can be 
made regarding the role of the parameters in reliability. 



C. THE ROLE OF PARAMETERS IN RELIABILITY 

In order to determine the role of the parameters in reliability, it is first 
necessary to distinguish the best graphical domain with sufficient sensitivity 
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FIGURE 33 GRAPH OF THE PDF F(X) FOR TWO DATA SETS 
in reliability to identify the uncertainty in one or both of the parameters a 
and (3. The parametric distributions may be presented either as a probability 
density function (pdf), cumulative distribution function (CDF), or 
transformed CDF. The transformed CDF, denoted F*, is a linearized plot of 
the CDF accomplished by the weakest link transformation of the probability 
F(Xi) using the relationship 

F* = ln(-ln(l -F(x i ))) (3-2) 

and transforming the realized random variable by ln(Xj). These three plots 
are shown in Figures 3.3 through 3.5, each containing a plot of distributions 
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resulting from two simulated data sets of ten each with different values for 
the parameters as indicated. 

Two data sets of ten points each are simulated to illustrate the graphical 
appearance of data in the three respective probability spaces. One large 
scatter data set (open circles) and one small scatter data set (solid circles) are 
simulated from parameters as indicated; the corresponding underlying 
distributions are respectively shown by solid and dashed lines. In the density 
space shown in Figure 3.3, the sample number is too small to form a 
nonparametric empirical pdf, as previously noted in section A of this chapter. 
Additionally, even if enough data was available for a description of the pdf, 
subtle changes of the curve in the region of very high levels of reliability, 
which correspond to low levels of probability of failure (e.g. 10' 4 and below), 
are not distinguishable in the pdf domain. This portion of the distribution 
is called the lower tail, and its shape is the focus of reliability applications. 
The lower tail is sensitive to variations in the parameters, making it 
important that this portion of the curve be graphically representable. 

While the same small data sets form a more meaningful trend in the 
CDF space, the lower tail is also obscured. The data points of the simulated 
data sets are plotted in this domain based on the rank of each sample. The 
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notion of rank refers to the value of the realized random variable of the 



sample relative to the sample tested, whereas the true rank is the value 
relative to all possible samples that exist in the population, tested or 
otherwise. A further discussion of rank is provided in Appendix A. 

The model which best fits this data is provided by the parameters of the 
model. The estimation of the best fit parameters is performed in the density 
domain, which results in the likelihood estimators. The likelihood estimators 
are the most likely parameters based on the observed data. For the Weibull 
model, the likelihood estimators are denoted a and $. Likelihood estimators 
and maximum likelihood estimators (MLE) are explained in more detail in 
Appendix B. 

Once the estimated parameters are determined using likelihood 
estimation, the curve representing the CDF may be plotted. The two 
resulting curves for the two simulated data sets are shown in Figure 3.4. 
Because the slope of the curves in the CDF domain is approaching zero in 
the low range of probability of failure they do not provide the resolution to 
distinguish the lower tail of the curve. 

The graph of F* versus In (X;) for the two data sets is provided in 
Figure 3.5. This graph is useful in reliability because the subtleties in the 
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lower tail will clearly be evident. There is an additional advantage of using 
this domain: the shape parameter a is the slope of the line, and the location 
parameter P is the x value which corresponds to the value of zero on the F* 
axis. Wide scattering of data is characterized by a small value of a, or a 
nearly flat line in the F* plot; a large value of a corresponds to very little 
scatter, which is represented by a steep line in F*. 

As previously mentioned, the area of interest in the graph with respect 
to reliability is the lower left hand portion of the graph which corresponds 
to the region of low probabilities of failure. This region is referred to as the 
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FIGURE 3.5 TRANSFORMED CDF F*(X) FOR TWO SETS OF DATA 
reliability region and is shown in Figure 3.6 as the shaded region in the F* 

graph. The top of this region is bounded on the F* axis by the maximum 
probability of failure permitted by the design specification, and bounded on 
the x* axis by the maximum permitted value of the design variable, also 
defined in the design specification. For example, the design specification of 
a composite pressure vessel may state that the vessel should be designed to 
accommodate a maximum stress of 30 ksi with a reliability of 0.9999. This 
specification would define a reliability region of 0.0001 on the F* axis or 
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FIGURE 3.6 RELIABILITY REGION IN THE F* DOMAIN 

F<(1 -0.9999) and x < 30 ksi on the x-axis. A sample of how the reliability 



region would appear as the shaded region in Figure 3.6. 

The reliability region is used to determine whether the object under test 
will be capable of meeting the design specifications. As an illustration, 
consider the two linearized curves of the simulated data sets in Figure 3.6. 
The curve which characterizes the data set with a small amount of scatter 
(cx=10) passes below the reliability region. This is representative of a very 
safe design, or possibly an overdesign, because the maximum allowed 
probability of failure is not reached until relatively high values of the random 
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variable are reached. Or stated differently, for the maximum permitted value 
of the random variable for the design, such as stress for a structure, the 
probability of failure is much lower than required. The designer may then 
evaluate whether the cost of this extra safety is justifiable. 

The curve representing the simulated data set with a large amount of 
scatter (oc=2) passes above a portion of the reliability region. In this case 
the design does not meet the required reliability and safety specification for 
certain values of the random variable. This result would indicate that either 
the design be reworked to meet the specifications or the specifications be 
relaxed to accommodate less reliability and safety. 

The parameters of the model selected to describe the physical process 
have been shown to be essential in the quantification of reliability. The 
reliability region was identified in evaluating the impact of the values for the 
parameters on the design. The next section will address how the estimated 
parameters of the model vary with the number of data tested and the type of 
data obtained from an experiment. 
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D. STATISTICS OF THE ESTIMATED PARAMETERS 



In order to show how the statistics of the estimated parameters vary, 
multiple simulations were conducted with different numbers of samples in 
a data set and different values of the underlying parameters. For each 
simulation run, the MLE parameters were computed for the generated data 
and tabulated. Since there are two parameters estimated, the relative 
frequency of occurrence of an a, 0 pair is represented as a three-dimensional 
histogram. Each column in the histogram would represent the number of 
occurrences that the MLE parameter fall within a specified band, called the 
class interval. 

The underlying values of the shape parameter a selected for the three 
simulations are 0.8, 5, and 20. The rationale for selecting these values is 
that they are numbers which are typical in describing life data for a Kevlar 
fiber, strength data for a graphite fiber, and strength data for a composite 
strand, respectively. These numbers not only illustrate the statistics of the 
parameters in the area of composite reliability, the interpretations can also 
be extended to data sets in general which are characterized by a large 
amount of data scatter (a=0.8), intermediate amount of scatter (oc=5), and 
small scatter (a=20). The two sample sizes selected for the simulation are 
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N=10 and N=100 samples. The rationale for the selection of these two 
numbers of samples is that they are typical bounds used in the quanification 
of reliability in engineering design. 

The MLE parameters were calculated for each simulated data set of 
N=10 and N=100 samples and stored. The process was then repeated 10,000 
times; resulting in 10,000 a and p. The objective of the simulation was to 
observe any differences which might exist in the distributions of the 
estimators as a function of the values of the shape parameters and sample 
sizes. 

The location parameter (3 was not varied in the simulation because the 
variability can be observed in the normalized form. 

1. Inferences on the Statistics of the Parameters based on 
Simulation 

The results of the simulations are provided in Figures 3.6, 3.7, and 
3.8. Each figure is a histogram representing the relative frequencies of the 
estimators and the corresponding marginal distributions which are obtained 
by projecting the histogram on the &, and $ axes. Stated mathematically for 
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(a) Underlying shape parameter a = 0.8, data sets of N=10 samples 




(b) Underlying shape parameter a = 0.8. data sets of N=100 samples 



FIGURE 3.6 JOINT HISTOGRAM AND MARGINAL 
DISTRIBUTIONS OF THE ESTIMATED PARAMETERS FOR cx=0.8 
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0 1 2 3 4 

(a) Underlying shape parameter a = 5, data sets of N=10 samples 



<i) 




(b) Underlying shape parameter a = S, data sets of N=100 samples 



FIGURE 3.7 JOINT HISTOGRAM AND MARGINAL 
DISTRIBUTIONS OF THE ESTIMATED PARAMETERS FOR a=5 
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(b) Underlying shape parameter a = 20, data sets of N=100 samples 



FIGURE 3.8 JOINT HISTOGRAM AND MARGINAL 
DISTRIBUTIONS OF THE ESTIMATED PARAMETERS FOR a=20 
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discrete random variables, the marginal distribution is defined as 



f,M = E /(*.y») 

n 

f y (y ) = IT /(w) 

n 



(3.7) 



where: f x (x) are the marginal distributions of the x random variable. 
f y (x) are the marginal distributions of the y random variable. 
f(x n ,y) is the joint probability distribution of x„ with y constant. 
f(x,y n ) is the joint probability distribution of y n with x constant. 

It is important to note that in each of the histogram or marginal distribution 

plots, the & and $ axes have been normalized by the known underlying 

values so that the true values will be represented as the number 1 .0 on each 

of the axes. 

The histograms and marginal distributions for the case where 
oc=0.8 and N=10 and N=100 are provided in Figure 3.6. The joint histogram 
in Figure 3.6(a) is characterized by a large scatter of the computed MLE 
parameter for the case of N=10. This is also evident in the marginal 
distributions of a and The scatter was reduced when more samples were 
tested in each data set, as shown in the case when N=100. In Figure 3.6(b) 
the marginal distributions of a and $ indicate that a tenfold increase in the 
amount of data in each sample set significantly reduces the uncertainty of 



42 



recovering the correct parameters. It is important to note that for both cases 
where N=10 and N=100, the distribution of the estimated location parameter 
$ is wider than the shape parameter oc. The overall interpretation is that 
when oc is small (i.e., the data is largely scattered) there exists a large range 
of parameters which would be likely to describe the data. 

The underlying value of the shape parameter oc in Figure 3.7 was 
increased to a=5 in order to represent sets of data that possess an 
intermediate amount of scatter. For the case of very small sample size 
N=10, as shown in Figure 3.7(a), the uncertainty in the MLE oc is 
approximately the same as that shown for a=0.8. The uncertainty in the 
location MLE however, is dramatically reduced when compared to the $ 
curve in Figure 3.6(a). The uncertainty in the both estimators is reduced by 
increasing the sample size to N=100. But as illustrated in Figure 3.7(b), the 
amount of the reduction in the uncertainty of $ obtained by increasing the 
number of samples simulated in each data set is not as great compared to the 
reductions obtained for oc=0.8. The interpretations of these results are that 
since the scatter in the data of the simulated data sets was reduced, the 
location parameter can be estimated fairly accurately, even with a small 
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amount of data. The increase in the samples comprising each data set 
significantly reduced the uncertainty of the estimator a. 

A data set which has little scatter in the data is representable by 
a large value of the shape parameter. The results of the simulation for this 
type of data set are provided in Figure 3.8, where a=20. Again, as shown 
in Figure 3.8(a), the uncertainty in a is approximately the same as that 
obtained for oc=0.8 and a=5. Note that the $ is known very well, and the 
improvement in the uncertainty of this estimator is not detectable when the 
sample data set is increased to N=100 samples, as illustrated in Figure 3.8(b). 

If the goal of the experiment was to estimate $ (the estimator of 
the location parameter (3), then for data which is described by very little 
scatter (large shape parameter), increasing the number of samples provides 
little improvement on the level of uncertainty in 0. A similar trend is 
expected for the estimated shape parameter a. Although this trend was not 
demonstrated for the series of simulations performed here, it is anticipated 
that for very high values of a, say oo50, only a small amount of data 
would be required to estimate the parameter with a high degree of certainty. 
In the limiting case where oc=<=°, the curve of the distribution would be a 
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vertical line in the F* domain, and only one realization of the random 
variable would be required to define the parameters. 

The influence of the number of samples in the data set and the 
associated scatter possessed by that data set significantly affects the 
uncertainty of the estimator. In some experiments, only limited amount of 
data would be required to determine the parameters; in others generous 
amounts of data would be needed. It is clear that some form of optimality 
condition should be devised which would assign a value to the amount of 
uncertainty in the estimated parameters. The experiment could therefore be 
designed so that the uncertainty is minimized under the constraints of limited 
equipment and time. The formulation of this optimality condition is the 
subject of Chapter IV. 
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IV. OPTIMIZATION OF INFORMATION IN AN EXPERIMENT 



The statistics of the estimated parameters are influenced by both the 
amount of data obtained from an experiment and the intrinsic scatter in the 
data. In many cases, large amounts of data are required to estimate the 
parameters within the desired degree of certainty. However, an experiment 
will only produce a limited amount of data. Experiments are commonly 
limited in the time available to conduct them, or by the capacity of the test 
equipment. In time-limited experiments, such as in life and fatigue tests, 
production deadlines often limit the amount of time allowed to perform the 
required testing and resources limit the number of experiments which can be 
run concurrently. Capacity limited data refers to the situation where the 
capacity of the test equipment is not sufficient to cause a realization of the 
random variable. For example, consider the problem of determining the 
ultimate torsional strength of a propulsion shaft for an aircraft carrier. If the 
test equipment available does not have the capacity to achieve the predicted 
failure torque, then the experiment is said to be limited in capacity. 



46 



A. CENSORING OF AN EXPERIMENT 



Censoring the experiment means the experiment is terminated at either 
a scheduled value of the random variable or by the number of realizations 
which have occurred. In a life experiment, scheduled censoring means that 
the experiment is terminated after a preassigned period of time has elapsed, 
regardless of the number of samples that have failed. If the life experiment 
was censored by the number of realizations, then the experiment would be 
terminated after the predetermined number of samples have failed, regardless 
of the time it takes for those samples to fail. A capacity limited experiment 
is essentially a schedule censored experiment because the it will be 
terminated when a specific value of the random variable is reached, no 
matter how many samples have failed. Consider the previous example of the 
aircraft carrier propulsion shaft. If the test equipment in use does not have 
the capacity to achieve the desired value of torque, then the experiment is 
effectively censored (scheduled) at the maximum torque that may be 
achieved by the test equipment. 

The question which arises in censoring an experiment is when should 
the censor be implemented. Is it better, for example, to allow 10 objects on 
test to proceed for the duration of a three year life experiment, or censor the 



47 



experiment at the end of each year and immediately put 10 new samples on 
test, thereby testing a total of 30 samples for one year each? Further 
complicating the issue, one may ask whether it might even be more useful 
to censor every six months, so that the number of repeated tests is six vice 
three or one. The advantage of censoring every six months is that a total of 
60 samples would be put on test as compared 30 samples for the yearly 
censor, or only 10 samples for the case if no censoring was utilized. The 
disadvantage of the six month censor plan is that the six months might not 
be enough time to produce any failures. 

A similar decision is required for capacity limited experiments. The 
censor torque for the aircraft carrier shaft example might be the maximum 
torque of the test equipment. However, if no shafts fail at that torque 
because it is low relative to the mean failure torque, then no data is obtained 
for estimating the parameters of the failure model. What is desired is to 
know the minimum down-sizing of the shaft for testing so that some failure 
will be observed, producing useful data for analysis. The reason the largest 
shaft is desired is to minimize size effect extrapolation from the reduced size 
test sample size back to actual prototype dimensions. 
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1. Effect of Censoring on the Estimated Parameters 

The solution of when to censor either a time or capacity limited 
experiment is secured through information of the likelihood function. The 
reason the likelihood function is chosen for this purpose is that it captures 
the probability that a specific pair of parameters are the ones which provide 
the best fit of the data. Since the goal of performing the experiments is to 
determine the parameters of the failure model, the likelihood function will 
serve to gauge the effect of early censoring on the estimated parameters. 

The likelihood function of Eq B.l must be revised to account for 
the samples which did not fail at the point when the experiment was 
censored. All of the samples which would have remained on test are 
therefore reliable up to the point of censor. The likelihood function becomes 
the product of the probabilities of the realized data times the reliabilities of 
the samples which survived. In equation form, the likelihood function for 
n exact and c censored samples of N total samples becomes 



Figures 4.1 through 4.3 demonstrate the impact of censoring on the 
likelihood surface and the corresponding marginal distributions of the shape 



n 
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parameter a based on the point a simulated experiment is censored. In the 
simulation, three different numbers of realized data were used as the censor 
points. Additionally, the simulation was executed for three different values 
of the underlying values of a. The motivation behind selecting three 
different censor locations and underlying shape parameters was to evaluate 
the effect of the number of realized data, denoted n, and the amount of 
scatter possessed by the experimental data have on the estimated parameters. 
Two of the values selected for a in the simulations were 0.2 and 1, as they 
represent typical high and low values for the shape parameter in fiber life 
experiments. The other value was chosen to be a=5, a commonly observed 
value for describing fiber strength data. Choosing practical numbers such as 
these will provide some insight as to how the actual data obtained from the 
fiber experiments will affect the estimated parameters. 

The likelihood surfaces in Figures 4.1(a)-(c) are the result of 
censoring a simulated experiment of 100 samples which have the intrinsic 
property of producing a large amount of scatter in the realized data (a=0.2). 
Each surface has the appearance of a long tunnel parallel to the (3 axis (the 
location parameter). The interpretation is that when a is small (large 
scatter), a large range of p’s could equally likely be the underlying location 
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parameter. The marginal distributions of a, provided in Figure 4.1(d), are 
obtained by numerically integrating out the P from the likelihood surface. 
There is minor difference in the shapes of the curves represented by 
censoring after 5, 10, and 25 realizations. This is an important result 
because it shows that little improvement in the uncertainty of the estimated 
parameter is obtained by prolonging the experiment so that 25 failures would 
occur instead of censoring it after only five failures. This suggests that early 
censoring is effective for processes described by small values of a. The 
early censor would therefore effectively allow a larger number of samples N 
to be tested, resulting in more early failures n to be used in the analysis. 
The disadvantage of implementing an early censor strategy is that knowledge 
of the location parameter is sacrificed, because data near the mean value is 
never realized because the experiment was terminated early. In life 
experiments, however, for low values of a the mean life is of secondary 
interest. Recall, the most useful quantity is the shape parameter of the lower 
tail in the reliability region. The more data which is realized in the area of 
interest, the better the estimate of the shape parameter. 

The effect of censoring a simulated experiment producing a data 
set described by an underlying a=l is given in Figure 4.2. The resulting 
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(a) Likelihood (data censored at 5 points) 
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(underlying alpha = 0.2) 



(e) Likelihood (data censored at 25 points) 




(b) Likelihood (data censored at 10 points) 
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FIGURE 4.1 EFFECT OF CENSORING ON LIKELIHOOD FOR oc=0.2 
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(a) Likelihood (data censored at 5 points) 

jJ jtev 

mmln 



alpha 



beta 




(underlying alpha=l) 



beta 





(c) Likelihood (data censored at 25 points) 
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FIGURE 4.2 EFFECT OF CENSORING ON LIKELIHOOD FOR a=l 
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(a) Likelihood (data censored at 5 points) 




(underlying alpha=5) 



(b) Likelihood (data censored at 10 points) 
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FIGURE 4.3 EFFECT OF CENSORING ON LIKELIHOOD FOR a=5 
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data possesses much less scatter than the previous data set described by 
oc=0.2. The realizations of the data have less variability, resulting in 
narrowing the range of parameters which would be likely to fit the data. 
This is evident in the likelihood surfaces shown in Figures 4.2(a)-(c), and in 
the marginal distributions of a in Figure 4.2(d). The trend of reducing the 
range of the probable number of parameters as more data is realized is 
visible in the shapes of the likelihood surfaces. The likelihood surface of 
Figure 4.2(a) is that of only five data points. As illustrated by the ’L’ 
shaped surface, the are many a and (3 combinations which would be equally 
likely. When more data points are obtained through delaying the censor 
(Figure 4.2(c)), the shape of the likelihood surface shrinks to a smaller 
number of probable parameters. The corresponding marginal distributions 
of each of these surfaces is provided in Figure 4.2(d). The significant 
reduction in the uncertainty in a is obtained by extending the experiment to 
achieve more realizations at higher values of the random variable. The 
observation in this case is that, for intermediate variability (a=l), censoring 
such a life experiment should be delayed. 

The likelihood surfaces and marginal distributions of a for the 
data simulated for an underlying a=5 are shown in Figure 4.3(a)-(d). The 
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reduction of the probable pairs of parameters resulting from increasing the 
number of realized data prior to censoring is illustrated by the sharpening of 
the likelihood surfaces in Figures 4.3(a)-(c). The resulting marginal 
distributions of a show that a dramatic reduction of uncertainty in the 
parameters is obtained by letting the experiment continue in time vice censor 
early. The reason for this is that for a=5, the failures will occur in a tighter 
range about the mean value, so that early failures really do not provide much 
detail about the characteristics of the data. 

Figures 4.1 through 4.3 have shown that the number of realized 
data and the underlying shape parameter of the data have a significant impact 
on the nature of the uncertainty of the parameters, and therefore play a 
crucial role in determining a censor strategy. For large numbers of realized 
data, the likelihood surfaces will shrink to a spike about the underlying 
parameters, regardless of the scatter in the data, making the problem trivial. 
For the more realistic cases where there is only a small number of realized 
data, the knowledge gain for the estimated a is larger than for the estimated 
P when okI. If ool, however, the reverse is true. An explanation of this 
trend is achieved by referring back to the appearance of the distributions in 
the F* domain. In the F* graph, the parameter a is the slope of the line 
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representing the distribution, and the value of p is the location of intersection 
between the distribution and the horizontal line of F* =0 (or F=.632). If a<l , 
then the slope of the distribution in the F* domain is less than 45°. For 
values of a«l, the distribution is nearly flat, and any variation of values 
along the F* axis, caused by errors in the empirical rank of the data, would 
affect the value of P the most because it is the point of intersection of two 
nearly horizontal lines. For a>l, the slope of the distribution in the F* 
domain is greater than 45°. If a»l, then the distribution is a steep line, 
causing the rank error to effect the slope a more than p. 

The design variable in planning an optimum censor strategy for an 
experiment is the censor locations, which are bounded by earliest censor of 
only one realization, and the latest censor would be after the last realization 
of the total N on test (the trivial case). If the available time of the 
experiment is insufficient to produce the required number of realizations of 
the random variable, or the test equipment is limited in capacity, then the 
total number of samples put on test must be increased by implementing 
censoring. The determination of the optimum censor location is made 
through the application of Information Theory. 
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B. USING CENSORING TO OPTIMIZE INFORMATION 



A univariate objective function for maximizing the knowledge of the 
parameters is obtained through the application of information theory. As 
stated in Chapter 2, the information I is a scalar measure of how well the 
parameters are known. For convenience, Eq. 2.5 is restated as follows 

7 = | p(6) log p(0) dd ( 4 - 2 ) 

where P { 0 1 D } is the probability of the parameter, given a set of data. 

For large sample sizes, say N > 1000, the probability density of the 
parameters is asymptotically normal by the Central Limit Theorem 
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and the information is analytically computable by 
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The typical sample sizes in engineering are N < 100, which does not 
produce a normal distribution of the parameters. As demonstrated by 
simulation, the marginal distributions of the shape parameters in Figures 4.1 
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through 4.3, the distributions are clearly not normal. The result is that an 
analytical form of the information /, such as that given in Eq 4.4, cannot be 
utilized. For small data sets, the probability density function of the 
parameter is the normalized likelihood function. 

The likelihood function is useful because it contains the internal 
parameters of n, N, a, and P and has the advantage that it is not distribution 
specific. In developing an optimum censor strategy, the objective is to find 
the censor location x c so that multiple repeated tests could be performed to 
maximize the certainty of the parameters, within the constraints of time and 
number of test stations. A high level of certainty in the parameters is 
characterized by a spike in the likelihood surface. On the contrary, a low 
level of certainty in the parameters would be represented by a more diffuse 
likelihood surface. The advantage of computing the information using the 
likelihood function is that it is a method of numerically distilling the many 
shapes of the likelihood surface into a single parameter. 

The many shapes of the likelihood surface may be determined 
analytically by computing the 0 lh moment, 1 st moment, 2 nd moment, and 
Kurtosis (mathematics of higher order moments). The drawback of using 
these methods is that the objective function would become multivariate due 
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to the number of moments which would be required to characterize the shape 
and the confidence on the estimator of the various moments cannot be 
defined. 

The optimum censor foundation is established by simulation as 
previously illustrated in Figure 3.2. The simulation is conducted as if the 
model and parameters are known. The simulated data set {xj is computed 
based on the expected rank of the i lh sample. The expected rank is defined 
as 

F = — ,i = 1,2, ..., N (4.5) 

N + 1 

The primary difference in this simulation for the calculation of information 
and previous simulations is that the data set {X} was not computed based on 
a random ranking as occurrence in an experiment. Instead, the expected 
values of the data were computed from the expected rank. The reason for 
using the expected values is that the interest in this study is to determine the 
various trends of information for data sets possessing different amounts of 
scatter and the impact of censoring on the information. The use of random 
ranking would result in a large number of simulations required to be run 
before the desired trend would be clearly distinguishable. 
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Information versus Scheduled Censor Location for various a's 




FIGURE 4.4 EFFECT OF a ON INFORMATION IN A SCHEDULE 
CENSORED EXPERIMENT 



Once the simulated data set is obtained, the number of realizations, n, 
is determined based on the number of x/s which have values less than the 
censor value x c . The information I is then computed based on the n realized 
data and N-n censored data. The repetitive computation of I for various 
censor strategies and different sets of data makes it possible to determine the 
best x c for increasing /, and to evaluate when a point of diminishing return 
is reached when trying to improve 7. 

The simulations conducted in this study are based on a life test 
experiment on sets of data described by three shape parameters: a = 0.2, 1, 



61 



and 5 for the reasons discussed in Chapter III. The impact of both scheduled 
censoring and censoring by the number of realizations was investigated. The 
algorithms and software used in this simulation are provided in Appendix D. 

The graph shown in Figure 4.4 is a comparison of the effects of the 
underlying value of a for the data on the information / for scheduled 
censoring. The time of censor axis is normalized by the underlying location 
parameter (3 so that the time of the censor relative to the mean life is evident. 
The curve representing a=0.2, or data with a large amount of scatter, appears 
nearly flat. The only dramatic changes which occur for this curve are in the 
region of very early censor times. Therefore, the small amount of increase 
in the information of the parameters which results from allowing the 
experiment to continue without censoring is not justifiable. The increase in 
/ for a=l occurs rather quickly up to the mean life and then diminishes. 
This is due to the increased grouping of the data and the observation that 
fewer failures occur early, resulting in little information being gained for 
early censor times. The change in information is even more dramatic for the 
case where a=5. The data is more grouped about the mean, so that the 
information gain occurs only in this area. 



62 




N=300 
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N=100 



N=50 



• N=10 



FIGURE 4.5 EFFECT OF N ON INFORMATION IN A SCHEDULE 
CENSORED EXPERIMENT FOR cx=0.2 



Figure 4.5 is a graph of information versus the scheduled censor times 
for various sizes of data sets for a=0.2. The purpose of this graph is to 
evaluate the impact of increasing N on the information. As shown in the 
graph, the increase in / is dramatic between N=10 and N=100 samples. 
Note, however, that the increase in / for an increase from N=100 to N=300 
samples is significantly less. The important implication of this graph is that 
there exists a point of diminishing return in trying to increase I by obtaining 
more data, and that the amount of increase is quantifiable. Similar results 
were obtained for a=l and a=5, as shown in Figures 4.6 and 4.7. 
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Information versus Scheduled Censor Location ( a = 1 ) 




FIGURE 4.6 EFFECT OF N ON INFORMATION IN A SCHEDULE 
CENSORED EXPERIMENT FOR a=l 



Information versus Scheduled Censor Location ( a = 5) 





FIGURE 4.7 EFFECT OF N ON INFORMATION IN A SCHEDULE 
CENSORED EXPERIMENT FOR a=5 
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The effect of censoring a experiment based on the number of realized 
data on the information is shown in Figure 4.8 for various underlying shape 
parameters. The abscissa is labelled the number of fractional realization 
because the number of realizations has been normalized by the number of 
samples put on test. The value of 0.1 for the fractional number of 
realizations means that 10% of the samples tested have failed. The plot of 
the curve for a=0.2 appears initially flat and then begins to increase at a 
fairly constant rate. This is due to the long, tunnel like shape of the 
likelihood surface for small number of data points when a < 1, as was 
shown in Figure 4.2. The implication of this shape was previously 
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FIGURE 4.8 EFFECT OF a ON INFORMATION FOR AN 
EXPERIMENT CENSORED BY THE NUMBER OF REALIZATIONS 



Information versus Number of Realizations for various a's 
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FIGURE 4.9 EFFECT OF N ON INFORMATION FOR AN 
EXPERIMENT CENSORED BY THE NUMBER OF REALIZATIONS 
FOR a =0.2 



disscussed and is caused by the large number of location parameters which 
would be likely to fit the data. Due to constraints imposed on the numerical 
algorithm to integrate the volume of the likelihood surface, the upper limit 
for integration in P was approximately 20 times the underlying value. The 
curve is flat in the region of fractional realizations less than 0.2 because the 
slight changes which occur in the shape of the surface take place past the 
maximum P integration limit, and are therefore not detectable. The 
justification for establishing this upper limit in p is that changes which occur 
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in the likelihood surface past the point of 20 times the mean life of an object 
are of little practical use. 

The curves representing oc=l and oc=5 in Figure 4.8 begin at lower 
values of / and increase faster than for a=0.2. The reason the information 
is initially higher for small values of a is that the curves representing the 
probability density f(x) for the Weibull model are skewed toward the region 
of early failures. As the value of a gets larger, the density becomes 
narrower (less scatter) and less skewed toward the early failures. Therefore, 
early failures which occur when a is large do not provide much information 
about the underlying shape parameter. 

Figures 4.9 through 4. 1 1 are graphs which show the influence of the 
total N samples put on test on the information for the three a’s of concern. 
In each figure, the concept of a point of diminishing return in improving I 
by more data is validated. 

1. Optimization of Information in a Time Limited Experiment 

An illustration of how recursive censoring can be used to improve 
the information of the parameters is provided in Figure 4.12. This figure is 
a plot of I versus the number of fractional realizations for oc=0.2 for N=10, 
50, and 100 total samples tested. Note that on the top of the graph an 
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Information versus Number of Realizations (a = 1) 
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FIGURE 4.10 EFFECT OF N ON INFORMATION FOR AN 
EXPERIMENT CENSORED BY THE NUMBER OF REALIZATIONS 
FOR a=l 



Information versus Number of Realizations (a = 5) 




FIGURE 4.11 EFFECT OF N ON INFORMATION FOR AN 
EXPERIMENT CENSORED BY THE NUMBER OF REALIZATIONS 
FOR a=5 



additional axis of scheduled censor times is plotted which is used to indicate 
the fractional number of samples realized within the specified time prior to 
censoring. The scheduled censor time of 0.1 of the mean life corresponds 
to the point of approximately 46% of the samples realized on the fractional 
number of realizations axis. An example of the impact of an early censor 
on the information of the parameters is shown on the figure. In this 
example, suppose that two sets of life tests of 50 samples were initiated at 
the same point in time. One test was allowed to progress through the 
duration of time up to four times the mean life, without censoring. The 
other test are censored at 10% of the mean life, and immediately 50 more 
samples were put on test. The point of the scheduled censor at 0.1 
establishes the baseline to be used in the comparison of the information 
between the two tests. 

The quantity labeled I x is the increase in information which results 
from the test proceeding to 0.2 of the mean life, without censoring. As 
shown on the graph, the amount of increase is extremely small. For the test 
which was censored, however, a total of 100 samples were put on test for a 
period of time of 0.1 of the mean life. The increase in information is 
labelled as / 2 and represents a drastic increase in the information. Even more 
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Improvement of Information through Recursive Censoring 

Large Increase in Information from recursive censor for a = 0.2 
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FIGURE 4.12 INCREASING INFORMATION THROUGH EARLY 
CENSORING IN AN EXPERIMENT 
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important is that the I 2 is a larger increase in information than would have 
occurred if the test without censoring was allowed to continue up to four 
times the mean life. The significance of this is that more information about 
the parameters was obtained in a fraction of the time it would take if no 
censoring was utilized. 

2. Optimization of Information in a Capacity Limited Experiment 

Information theory can also be applied to an experiment which is 
limited by the capacity of the test equipment. Consider an example in which 
the goal of a particular experiment is to predict the reliability of a large flat 
plate, such as the decking on a ship. The problem exists that the test 
equipment available only produces 40% of the expected mean failure load 
of the plate. Resolution is required for the question of whether the 
appropriate parameters can be obtained by testing the full size plate. The 
issue is resolved through the application of information theory. The goal is 
to use information theory to determine if an experiment will produce better 
certainty of the parameters by testing smaller scaled down sizes of the plate 
or simply more tests of the big plate. 

The failure process of the plate is modeled by dividing it up into 
many smaller, equal-width strips of plate in parallel. If the failure process 
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(4.6) 



R(J>,) = exp(-(|i)“) 

H 

is characterized by fracture mechanics, the entire plate will fail when a crack 

is initiated at the weakest strip and propagates through the plate 

catastrophically. The Weibull model is selected for this problem because it 

is the presence of a extreme value which results in the failure. Reliability 

for an individual strip of the plate is given as 

where: p, is the random variable of load for the i th strip 
Pi is the mean load of failure for each strip 
a is the shape parameter of the failure model 
R(p, ) is the reliability for the value of the random variable. 

Since the process is weakest link (or one which is serial in the failure 

mechanism) the reliability of the plate is the product of the reliabilities of the 

individual strips in the plate as in 

R = Rip,) R(p 2 ) ... R(p k ) (4.7) 

Three different sizes of the plate are proposed to be tested utilizing the 
maximum load of the test machine. The largest size tested is chosen to be 
the full size plate, because no size effect considerations are required. The 
two smaller sizes, one representing a moderate reduction in the size of the 
plate (medium plate), and the other representing a large reduction in size 
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(small plate), are determined from size effect considerations and the desired 
number of failures in the experiment. 

The three proposed plates to be tested are shown in Figure 4.13. The 
values of k L , k M , and k s represent the number of parallel strips in the plates; 
a unitless metric of the plate width since all strips are assumed to have both 

(a) Small Plate Model - ks strips (b) Medium Plate Model - strips 




FIGURE 4.13 FAILURE MODELS FOR A PLATE IN TENSION 
equal width and length. The reliability of the entire large plate is obtained 
from Eq 4.7 by taking into account equal loading for each strip and the 
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presence of k L strips 



R, = exp - 




( 4 . 8 ) 



Similar expressions are obtained for the reliability of the medium and small 
plate. The intention is to select the proper sizes of the medium and small 
plate such that the reliability of these smaller plates is equal to the reliability 
of the large plate. This makes it possible to predict the reliability of the 
large plate by testing one which is smaller. In equation form, this is 
represented by equating the respective reliabilities: R L = R M = R s . These 

equations are then simplified in terms of the applied load ft and the number 
of strips in the test sample k. The resulting equations are written as 




( 4 . 9 ) 




( 4 . 10 ) 
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Changes in the sizes of the plate do not affect the intrinsic scatter in the 
failure data because the failure is governed by the weakest link. Therefore, 
the values of shape parameters of the three plates are assumed equal: a L = 
a M =oc s . The difference in the sizes affects the mean value of the strength 
since there is a greater likelihood of possessing a very weak strip when the 
number of strips in the plate is large. Fortunately, the increase in the 
number of strips in the structure reduces the individual loads on each strip 
because the load is more distributed. 

Two different load levels, p M /(3j and p s /p,, were established for the 
medium and small plates which would produce failures in 60% and 90% of 
the samples tested, respectively. These percentages were selected in order 
to determine the influence of the numbers of failures for the smaller plates 
on the information of the parameter a. The loads which correspond to these 
probabilities of failures F(p/P,) are determined from the CDF of the failure 
model which is computed using the expected values of the parameters. In 
many applications in engineering, some knowledge is possessed about the 
expected shape and location parameters of the failure model. It is assumed 
in this example that the underlying a = 5, P = 1. A plot of the resulting 
CDF is provided in Figure 4.14. Based on this curve, the capacity limited 



75 



load placed on each strip in the large plate is p L /(3 = 0.4, which corresponds 
to failure of 2% of the samples tested. The size of medium plate is selected 
to produce 60% of the plates tested to fail, or p M /p = 0.97 from Figure 4.14. 
The small plate is to be sized such that 90% of the plates tested will fail, 
resulting in p s /(3 = 1 .2. The purpose of choosing those particular loads is to 
determine which plate should be tested so that the most information about 
the parameters is obtained from the experiment. 

Determination of the sizes of the medium and small plates relative to 
the large plate is obtained by substituting the corresponding loads discussed 
above in Eqs 4.9 and 4.10 for a=5. The result is that k L is 83.86 times 
larger than k M , and 243 times larger than k s . 

The graph of the Information / versus the number of fractional 
realizations for a process with an underlying value a=5 is provided in Figure 
4.15. In the case of the large plate, the probability of failure is so low at the 
load at maximum capacity that the number of plates tested was chosen to be 
large, or N=100. The information resulting from testing 100 large plates is 
labelled by point ’A’ in Figure 4.15. If the medium plates are tested a 
higher percentage will fail, so only N=50 are tested. This increases the 
information dramatically, as seen by the point labeled ’B.’ If 10 of the 
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FIGURE 4.14 UNDERLYING CDF FOR THE PLATE FAILURE 
EXAMPLE 

small plates are tested, almost all samples will fail, producing the 
information labeled ’C’ in Figure 4.15. If 50 of the small plates are tested, 
the information is increased further to point ’D.’ 

The purpose of performing this capacity limited experiment is to 
determine the parameters of the failure model of the large plate so that the 
reliability at various stresses may be quantified. Testing the large plate, 
however, only produces a few failures due to the limited load. The result is 
a very small amount of information / on a. The size of the plate can be 
reduced to a size which is sufficiently small such that all samples which are 
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Minimizing the Loss of Information in a Capacity 
Limited Experiment 




I) : Information gain if Full scale shaft tested 

12 : Information gain if Half scale shaft tested 

13 : Information gain if Tenth scale shaft tested 



FIGURE 4.15 MINIMIZING THE LOSS OF INFORMATION IN A 
CAPACITY LIMITED EXPERIMENT 



tested will fail. The small size of the samples makes it possible to test a 
large number of plates to failure, but the gain of information on a diminishes 
once a sufficient amount of data has been collected. The number of samples 
tested for the small size can be optimized to maximize the information on 
a. 

The uncertainty, however, is reintroduced from the size effect 
when the information on a is transformed from the small sample back to the 
actual size. This is due to a large k L /k s ratio being raised to the exponent of 
1/a. Any uncertainty in the a will produce a substantial error in the 
predicted location parameter of the large object. What is gained, therefore, 
in the knowledge of a by testing more small samples may be lost in the 
transition to the actual sample because of the large size effect. 

Increasing the size of the sample tested will reduce the impact of 
uncertainty of a, but the information of a may be lower than that obtained 
from testing many of the very small samples. The objective then is to 
minimize the loss of information by increasing the size of the sample in 
order to reduce the impact of uncertainty in a on a large scaling effect ratio. 
In this regard, point ’B’ in Figure 4.15 would be the desired target point for 
the design of the experiment. 
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V. CONCLUSIONS 



The characterization of reliability requires a failure model based on the 
physics of the failure process, and parameters based on experimental data. 
The limitations of experiments in terms of time or capacity give rise to the 
need for an experiment designed to maximize the knowledge of the 
parameters within the constraints of the limitations. The optimality 
parameter used was the information / of the parameters which distills the 
multitude of shapes of the likelihood surfaces into a single parameter. 

This study has demonstrated that for high variability data, such as life 
or fatigue data, the information obtained from experiments can be 
dramatically increased by recursive censoring. In cases of low variability 
data, such as data resulting from experiments limited in capacity, information 
can be used to determine the maximum structural dimension which would 
result in meaningful estimation of the parameters. 

Due to the large number of possible shapes in the likelihood surface, an 
integration scheme which accommodates an adaptive mesh requires 
development. This would provide the ability to compute information for a 
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large range of parameters without first having to define the bounds of the 
likelihood surface for every data set. 

The application of information theory developed in this study should be 
extended to cover a range of parameters describing the randomness of data 
and censoring strategies in order to develop a set of optimality curves which 
could be used in interactive planning of an experiment design. 
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APPENDIX A: RANK ANI) ORDER IN A DATA SET 



The order of a data set is the arrangement of the data from the smallest 
magnitude of the realized random variable to the largest. An ordered data 
set satisfies the following inequalities [Bury, 9] 

£ x 2 £ • • • £ x n (A.l) 

The rank of a particular sample value x i with respect to the other values 
within the data set is determined by its relative magnitude. The smallest 
value, or the first value in the ordered data set, would have the lowest 
ranking, and conversely, the largest value would have the highest ranking. 
There are numerous ranking methods available which provide discrete values 
of the CDF based on the ranking of each sample in the data set. One of the 
more widely accepted methods of determining F is that of expected rank 

F(x t ) = — i = 1, 2 N (A.2) 

1 N* 1 

where F(x i ) is the probability that a value at least as large as x ; will occur in 
a data set {X} of N samples. Since the value F(x i ) can be determined 
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without the selection of a model, the expected rank of an ordered data set 
represents a viable nonparametric method in computing reliability. 

For example, a fiber having a rank of 0.25 is at least as strong as 25% 
of the fibers tested, which is the same as saying that the probability failure 
F=0.25. However, it may subsequently be determined from additional testing 
that the particular fiber is only as strong as 20% of the fibers tested, or 
F=0.2. Since all fibers cannot be tested, the true rank of the fiber will never 
be known, but the rank serves as a useful way of estimating the value of F 
so that the data may be plotted. 
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APPENDIX B: LIKELIHOOD AND MAXIMUM LIKELIHOOD 

ESTIMATORS 



The likelihood function is defined [Bury, 9] as 



ux-fi) = n M; 0 ) (B1) 

i=l 

where: L(X,0) is the likelihood of the parameter(s) 0 given the data set 
X 

f (Xj,0) is the pd of the i ,h realization of x for the 
parameter(s) 0 

n is the number of samples in the data set. 

The symbol 0 used in the equation is a vector which represents both the 
shape and location parameters a and (3 used in the Weibull distribution. 
Recall that the pdf is the derivative of the CDF with respect to the random 
variable, which is given as follows for the Weibull distribution 

/(*;a, P) = ^ exp[-(^n (B-2) 



The likelihood function is a mathematical representation of the 
probability that a selected pair of parameters describe a given data set. A 
large value of likelihood results when the selected parameters are more likely 
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to be the underlying parameters. This is due to the product of the probability 
densities. If the selected parameters are far from the underlying values, the 
resulting probability densities will be very small numbers, and the product 
of these numbers will be an even smaller number. As the selected 
parameters are in the neighborhood the underlying values, the probability 
densities will become larger, and correspondingly, the likelihood will become 
larger. The point at which the likelihood is the maximum represents the 
most likely parameters which describe the given data set, and are referred to 
as the maximum likelihood estimators (MLE). The MLE parameters can be 
determined by calculus by taking the derivative of likelihood function of 
equation (B.2) with respect to the parameters 0. The resulting MLE 
parameter for the Weibull distribution are given as 
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Alternatively, the values of likelihood for a model consisting of 
two parameters, such as in the Weibull model, can be represented as a three- 
dimensional surface plot or two-dimensional contour plot. The likelihood 
surfaces and contour plots for the data sets represented by curves in Figure 
3.3 are given in Figure B.l. For the cases where the likelihood surface is 
unimodal, the MLE parameters (determined by Eqs. (B.3) and (B.4)) are in 
accord with the peaks on the likelihood contours. 

The estimators determined either by the likelihood contour or by 
MLE are based on the given data set. Due to the randomness associated 
with each set of data, different estimators are likely to be obtained from 
different sample sets, even from the same population. 
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FOR TWO DATA SETS 
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APPENDIX C: LOWER TAIL SUBTLETIES IN DATA 



As an illustration of the many subtleties in the lower tails of the 
distributions encountered in random data sets, four representative F* curves 
are provided in Figure C.l. These plots were obtained from randomly 
generated sets of data of 100 samples. The underlying distributions which 
are straight lines are plotted as a guide for comparison. Notice that in some 
cases the points fall above the line, indicating data which fail earlier than the 
model, while in other cases the points fall below the lower portion of the 
curve, indicating that the samples later than the model. There will, of 
course, be cases in which the data closely resembles the underlying 
distribution. The utility of displaying the distributions in the F* domain is 
clearly evident in these four graphs. 
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APPENDIX I): SIMULATION SOFTWARE 



I. SOFTWARE DESCRIPTION 

The purpose of this appendix is to provide a description of the 
simulation procedures and the software. The rationale behind each 
simulation procedure implemented will be addressed in this appendix, 
followed by the software used to implement that particular component. The 
software programming used in the simulation was developed using the 
MATLAB program by The MathWorks, Inc. 

A. SIMULATION USING A NONPARAMETRIC METHOD TO 
CHARACTERIZE RELIABILITY 

This simulation was an investigation into whether practical levels of 
reliability (i.e., 0.9999 or 0.99999) could be achieved without the need of a 
model. The advantage of using a nonparametric method is that reliability 
can be quantified based on the data directly, without the need of fitting a 
particular distribution to the data. The issue to be determined from the 
simulation is whether or not using a nonparametric method is pragmatic in 
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the amount of data required and the associated cost and time to obtain the 
data. 

The quantity which is random in any experiment is the probability of 
occurrence of a random variable. In terms of composite reliability, it is the 
probability of failure in the random variables of strength and life which is 
random. This random probability of failure can be simulated by setting it 
equal to a random number, defined as having a value between zero and one, 
generated by a computer program. 

The data which correspond to those probabilities of failure can be 
computed using an empirical rank of the data. The concepts of order and 
rank of data are discussed in Appendix A. 

1. Simulation Description 

A set of numbers representing the probability of failure F(Xj) of N 
samples is generated as a set of random numbers {F}. The data set {X} 
cooresponding to those values of { F } are computed using the Weibull 
distribution with known shape parameter a and location parameter P 



91 



= P (-lnd-F ^.))) 0 



(D.l) 



The data set {X} represents a set of realized random variables such as 
strength or life, which have underlying values of the parameters a and P 
that describe the data set. In other words, {X} simulates an actual set of 
data that would have been obtained by testing a structural sample, fiber or 
composite, to failure. The difference in the simulated data set {X} and a 
data set resulting from an actual test is that the underlying parameters are 
known for {X}. 

The data set {X} is then ordered according to Eq (A.l) and the values 
of F(x t ) are computed using expected rank of Eq (A.2). In addition, the true 
rank of each sample, which is the random number used to compute x i is used 
for the purpose as a point for comparison with the expected rank. It must 
be reiterated that the true rank of any data point will almost never be 
absolutely known. 

1. Simulation Software 

% MATLAB program to compare nonparametric method of describing a set of 
% data using expected rank with the known or underlying distribution of 
% the data set 
% 

% Developed by: LT James W. Coleman, USN 

% 
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% Prompt user for input of the number of samples in the data set, the 
% number of simulated data sets desired, and the underlying parameters 
% of the Weibull distribution used to form the data set. 

% 

n=input(’ Enter number of samples in the data set ’); 
iter=input(’Enter number of simulations desired ’); 
alpha l=input(’Enter underlying value of shape parameter alpha ’); 
betal=input('Enter underlying value of location parameter beta ’); 

% 

% Compute the expected rank for each of the samples, and transform to 
% the ln-ln space F* to linearize CDF. Note that expected rank depends 
% only on the number of samples in the data set. 

% 

erank=[l:n]/(n+l); 
fstar=log(-log( 1 -erank)); 

% 

% Perform the simulation as many times as specified. 

% 

for i=l:iter 

% 

% Generate a set of random numbers and assign them as the probability of 
% failure F(xi). Order the probabilities F(xi) and transform into the 
% F* space 
% 

fxi=rand(l:n); 
fxi=sort(fxi); 
fstarx=log(-log( 1 -fxi)); 

% 

% Solve for the values of xi (the set of data) which correspond 
% to those F(xi)’s, given the shape parameter alpha and location 
% parameter beta. The values of alpha and beta used are the underlying 
% parameters and the Weibull distribution used is the underlying distribution. 

% 

xi=betal*(-log(l-fxi)). A (l/alphal); 

% 

% Compute the expected rank for each of the samples, and transform to 
% the ln-ln space F* to linearize CDF 

% 

erank=[l:n]/(n+l); 
fstar=log(-log( 1 -erank)); 

% 

% Plot the results of the expected rank F and the true rank values F(xi) 

% versus the log(xi) in the transformed F* space 
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semilogx(xi,fstarx,’-w\xi,fstar,’ow’) 

pause 

7c 

7c If the simulated data set displays interesting characteristics, the user 
7c may save it for later analysis. File name will have the form xi#.dat. 
qsave=input(’Enter 1 to save this data set xi, 0 to continue’); 
if qsave==l 

eval([’save xi\num2str(iter),’.dat xi /ascii’]) 
acknw=input(’ Record parameters used for xi and hit Enter’); 
end 
end 



B. SIMULATION TO DETERMINE THE DISTRIBUTIONS OF 
ESTIMATORS 
1. Simulation Description 

In this simulation, a random set of data is generated using the 
procedure described above, and the maximum likelihood estimators are 
computed from the random data set. The algorithm used to determine the 
MLE’s incorporates the Golden Section Method to find the zero of a 
function of one variable. The equation of a is given as 
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The golden section method is used to find the zero of Eq D.2; the result 
being the MLE a. Although not considered an efficient means of 
determining the zero of a function, the golden section method has the 
advantage of guaranteed convergence within a computable band of certainty. 
The MLE $ is determined from the value of a by the following equation 

(D.3) 

Once the MLE’s are computed, they are stored in column vectors. The 
process of generating the random data set is repeated 10,000 times in order 
to produce a large number of estimators to be analyzed. The estimators are 
then sorted using a separate routine so that the results can be viewed in the 
form of a joint histogram. The marginal distributions are obtained from the 
joint histogram by applying Eq 3.7. The amount of uncertainty in the 
estimators can then be determined from the resulting distribution. 

2. Simulation Software 

% MATLAB program which computes the Maximum Likelihood Estimators (MLE) 

% of a simulated random data set xi of n samples. The process is repeated 
% 10000 times in order to obtain a distribution of the MLE’s corresponding 
% sets of data of n samples. Each simulated data set has known values of the 
% parameters in the Weibull distribution used to construct the data from 
% random numbers. 

% 

% Developed by: LT James W. Coleman, USN 



P = 



1 " 
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% 

% Description of the variables: 

% max - Maximum number of repetitions, or # of MLE’s computed 
% count - count number of repetitions 
% n - # of samples in the simulated data sets 

% stora,storb - column vectors which MLE’s alpha’ and beta’ are stored 
% alpha 1, beta 1 - underlying values of the parameters 
% fxi - Probability of failure CDF 

% xi - Simulated random data set with an underlying distribution 
% xl,xu - Upper and lower bounds used when computing MLE alpha’ 

% fl.fu - Residual from MLE equation for alpha to be minimized 
% niter - Number of iterations needed to bracket the MLE alpha’ 

% xl,x2 - Intermediate values used to compute MLE alpha’ 

% fl,f2 - Residuals based on the intermediate values 
% t - Golden Section Ratio 

% k - iterations performed within the golden section method 
% fmin - Minimum residual computed in the golden section method 
% alphah - MLE alpha’ 

% betah - MLE beta’ 

% 

% Initialize variables 

% 

max= 10000; 

count=l; 

n=10; 

stora=zeros(max,l); 
storb=zeros(max, 1 ); 

% 

°Jc Prompt user for input of number of samples and underlying parameters. 

% 

n=input(’ Enter number of samples in the data set ’); 
betal=input(’Enter underlying location parameter beta ’); 
alpha l=input('Enter underlying shape parameter alpha ’); 

% 

% Begin repitition of the simulation 

% 

while count<=max 

% 

% Generate a random data set xi from random numbers assigned as the 
% CDF values F(xi) 

% 

fxi=rand(l:n); 

xi=betal *(-log( 1 -fxi)). A ( 1/alpha 1 ); 
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% 

% Solve for the MLE’s using the Golden Section Method to minimize a 
% function of one variable. The function solva(alpha,xi) computes a residual 
% value f resulting from substituting the guessed value of alpha and the 
% data set xi into the equation of ML for alpha. The residual, therefore, is 
% the difference between the computed value and the most likely alpha. This 
% is the value being minimized. 

% Compute residual for the upper and lower bounds xl and xu. These values 
% are set at a factor of 10 above and below the underlying values. 

% 

xl=alphal/10; 

xu=alphal*10; 

fl=solva(xl,xi); 

fu=solva(xu,xi); 

% Seventeen iterations are required for interval of uncertainty of 0.001 19 
niter=17; 

% Determine the lower intermediate value xl of alpha based on a weighting by 
% the golden section ratio t applied to the lower bound. Compute residual 
% with this value. 
t=0.381966; 
xl=(l-t)*xl+t*xu: 
fl=solva(xl,xi); 

% Determine the upper intermediate value of alpha based on a weighting by 
% the golden section ratio t applied to the upper bound. Compute residual 
% for this value. 
x2=t*xl+(l-t)*xu; 
f2=solva(x2,xi); 
k=4; 

while kcniter 
if fl>f2 

% The residual from xl is larger than residual from x2; make the old xl 
% the new lower bound, xl (lower intermediate value) becomes the old x2, 

% the new value of x2 is computed using the golden section ratio on the 
% new bounds. Compute residual for new x2 
xl=xl; 
fl=fl; 
xl=x2; 
fl=f2; 

x2=t*xl+(l-t)*xu; 

f2=solva(x2,xi); 

k=k+l; 

else 

% The residual from x2 is larger than residual from xl; make the old x2 
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% the new upper bound, x2 (upper intermediate value) becomes the old xl, 

% the new value of xl is computed using the golden section ratio on the 
% new bounds. Compute residual for new xl 
xu=x2; 
fu=f2; 
x2=xl; 

£2=fl; 

xl=(l-t)*xl+t*xu; 
fl=solva(xl,xi); 
k=k+ 1 ; 
end 
end 

% Determine which of the residuals is the lowest and the value of alpha which 
% corresponds to that residual. 
fmin=min([fl,fl,f2,fu]); 
if fl==fmin, alphah=xl;, end: 
if fl==fmin, alphah=xl;, end; 
if f2==fmin, alphah=x2;, end; 
if fu==fmin, alphah=xu;, end; 

% 

% Compute the MLE beta from using the MLE alpha and store both estimators in 
% column vectors stora and storb 

% 

betah=(sum(xi. A alphah)/n) A (l/alphah); 
stora(count, 1 )=alphah; 
storb(count, 1 )=betah; 

% 

count=count+l; 

end 

% Save the results for later analysis 
save a_510.dat stora /ascii 
save b_510.dat storb /ascii 



C. SIMULATION TO COMPUTE INFORMATION IN AN 
EXPERIMENT 
1. Simulation Description 
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The simulation developed here will compute the information 
resulting in either a schedule censored or number of data censored 
experiment. The data set generated in the simulation is the expected values 
of the realized random variable x, which correspond to the expected rank and 
the underlying values of the parameters specified by the user. The user 
inputs the the underlying value of a, the number of samples in the 
experiment, and a vector containing either the scheduled censor locations 
(fraction of mean) or the censor points based on the fractional number of 
realized data. The program computes the likelihood of the parameters for 
each censored data set. The parameter space is defined from 0.25 to 4 times 
the underlying value of alpha, and 0.05 to 150 times the underlying value of 
beta. The large range of beta is required when the underlying alpha is less 
than unity and only a small number of points are realized. 

The marginal distributions of the parameters are obtained by 
projecting the likelihood surface on the respective axes. The marginal 
distribution of alpha is then integrated and normalized by the resulting area. 
The information is computed based on the marginal distribution of alpha and 
stored corresponding to that particular censor location. The process is 
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repeated until the information has been calculated for each of the input 
censor points. 

2. Simulation Software 

% MATLAB Program to compute information based on time censored data. The user 
% definesthe input %values for the underlying alpha. The location parameter beta is % 
fixed at 100 for this particular grid. The vector tcensor defines when the data is % 
censored (fraction of the mean). The numerical values of information are 
% contained in the vector I. 

% 

% Developed by: LT James W. Coleman, USN 

% 

% Define underlying parameters and range of alphas 

% 

alpha l=input(’enter underlying value of alpha ’) 

alphamax=4*alpha 1 ; 

alphamin=alphal/4; 

astep=(alphamax-alphamin)/50; 

alpha=alphamin:astep:alphamax-astep; 

beta 1=100; 

% 

% User defines number of samples in data set 

% 

n=input(’enter number of samples in the data set ’) 

% 

% Determine the set of expected data, based on expected rank 

% 

for k=l:n,fxi(k)=k/(n+l);,end 
xi=beta 1 *(-log( 1 -fxi)). A ( 1/alpha 1 ); 
xi=sort(xi); 

qcensr=input(’enter 0 for scheduled censor, 1 for censor based on failures ’) 
if qcensr==0 

% 

7c Define the censor locations as a fraction of the mean 

7c 

tcensor=input(’enter censor points as fraction of mean life ’) 
lmax=length(tcensor); 
else 

xcb=input(’enter censor points as fractional realizations ’) 
lmax=length(xcb); 
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end 

% 

% Initialize variables and the likelihood 

% 

count=l; 
inc=l; 
flag 1=0; 
flag2=0; 

L=zeros(50,50); 
while count<=lmax 
if qcensr==0 
% 

% Determine the time of censor and use this value to define which points are exact 

% 

xc=tcensor(count)*betal 

index=max(find(xi<=xc)); 

xie=xi(l:index); 

c=length(xie) 

else 

% 

% Determine the exact data and the number of exact data 

% 

nreal=xcb(count)*(n); 
xie=xi(l:nreal); 
c=length(xie) 
if c<=l 
xc=xie; 
else 

xc=max(xie); 

end 

end 

beta=5:5:250; 
iter= 1 ; 

while iter<=3 

% 

% Compute likelihood (50x50 array) for censored data 
% 

L=zeros(50); 
if c~=0 
for i=l:50 
for j=l:50 
a=alpha(i); 
b=beta(j); 
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xib=xie/b; 

LI =((a/b) A c)*prod((xib. A (a- 1 )).*exp(-(xib A a))); 

L2=(exp(-(xc/b) A a)) A (n-c); 

L(ij)=Ll*L2; 

end 

end 

m=length(alpha); 

a=l; 

b=l; 

% 

% Obtain marginal distributions of alpha by summing along the betas 

% 

for u=l :50,ma(u)=sum(L(u,:));,end 
if iter==l,mal=ma’;,end 
if iter==2,ma2=ma’;,end 
if iter==3,ma3=ma’;,end 

% 

% Normalize alphas for a common integration range by dividing astep by 
% alpha 1 

% 

delta=astep/alphal 

% 

% Compute the area under the marginal distribution and normalize the values 

% 

area(iter)=numintg(m,a,b,delta.ma) 

ma=ma/area(iter); 

% 

% Compute the information based on the marginal distribution of alpha 

% 

info=zeros(50,l); 
if iter==l 
for u=l:50 
if ma(u)~=0.0 

info(u)=info(u)+ma(u)*log(ma(u)); 

end 

end 

l(count)=sum(info); 

end 

% This test is to determine if the likelihood is required to be computed for the higher 
% values of beta, such as used in iter=2 and 3. As more data is realized, the 
% likelihood surface narrows to the region where beta= 5:250; no longer requiring 
% the computation for high values of beta. The test compares the differences in the 
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% area from iter=2 to iter=l. If the area computed in iter=2 is very small compared % 
to iter=l, then additional computation of likelihood in that range of betas. 

7c 

if (iter==2)&(flag 1 ==0) 
diffa 1 2=area(2)/area( 1 ); 
if diffa 12<l.e-6, flag 1=1, end 
end 

if iter==3 
if flag2==0 

diffa 1 3=area(3)/area( 1 ); 
if diffa 13<l.e-6,flag2=l, end 
end 

% 

7c If all three iterations are required to define the likelihood, revise the previous value 
7c of 1 
7c 

info=zeros(50,l); 
mat=ma 1 +ma2+ma3; 
ma=maf ; 

7c 

7c Compute the area under the marginal distribution and normalize the values 

% 

areat=numintg(m,a,b,delta,ma); 

ma=ma/areat; 

% 

% Compute the information for this censor location 
% 

for u=l:50 
if ma(u)~=0.0 

info(u)=info(u)+ma(u)*log(ma(u)); 

end 

end 

I(count)=sum(info); 

end 

I 

else 

% 

7c If no data was realized, set I = 0 
7c 

l(count)=0; 

I 

end 

7c 
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% Check the flags to determine if large values of beta are required for the next 
% likelihood calcuation 

% 

if flag2==l,iter=3;,end 

if (flag 1 == 1 )&(iter==2),iter=3;,end 

iter=iter+l 

% 

% Define the regions for the large values of beta 

% 

if iter==2,beta=250: 1 00:5 1 50;,end 
if iter==3,beta=5 1 50:200: 1 4950;,end 

% 

% Initialize variables for next iteration 

% 

ma=zeros(l,50); 

mat=zeros(50,l); 

L=zeros(50,50); 

end 

% 

% Initialize variables for next iteration 

% 

count=count+l 

inc=inc+l; 

L=zeros(50,50); 

end 
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